Preface


This study was accomplished under work unit 77191819, Voice Spectral Analysis as a Measure of Stress in Air Combat. This work unit is part of the Laboratory program on Personnel Qualifications, which supports the research thrust area Force Acquisition and Distribution Systems. This particular work was a feasibility study exploring a potentially useful new technology. The primary reason for documenting this effort is to record lessons learned. The lack of success of this research might be attributable to two possible causes: (1) an inappropriate R&D approach, or (2) the non-existence of a consistent and measurable change in a person's voice under conditions of stress.
INTRODUCTION


In human languages, the change of the fundamental frequency (F0) over time contributes both linguistic and paralinguistic information to the total message articulated. In many languages, of which Chinese is the best-known example, controlled changes in fundamental frequency are phonemic and used for linguistic purposes. The same vowels and consonants signify different words when different F0 patterns are employed; therefore, the lexical meaning of the word depends upon the type of F0 contour it contains. In English, F0 changes are said to be nonphonemic, since the lexical meaning of a word cannot be altered by a change in the F0 contour over the course of the word. Nonetheless, the information conveyed by the fundamental frequency in English is far from unimportant. In conjunction with the cues provided by intensity and duration, changes of F0 in English convey to the listener whether a statement is being made or a question is being asked; which syllable in a word is being stressed or which word in a sentence is being emphasized; and whether an utterance reflects surprise, dismay, assuredness, or shock on the part of the speaker. It is the acoustic correlates of these prosodic and suprasegmental features of speech that give some indication of the emotional state of the individual. Williams and Stevens (1972) suggest that the F0 of the speech signal versus time appears to be the clearest indicator of the emotional state of the speaker.

Numerous researchers (Hecker, 1971; Lynch, 1934; Fairbanks and Pronovost, 1939; Fairbanks and Hoaglin, 1941; Lieberman, 1961; Lieberman and Michaels, 1962; Fonagy and Magdics, 1963; Uldall, 1960) have investigated the relationship between speech and artificially simulated emotions. Only a few studies (Skinner, 1935; Huttar, 1968) have used normal speech in their experiments. All of the above studies, both real and simulated, focused on connected speech. Hollien et al. (1973) used sustained phonation rather than connected speech in an attempt to determine whether the reported changes in cycle-to-cycle variation were due to (1) involuntary, inherent phonatory variation (jitter), or (2) voluntary and/or learned inflectional speech patterns. Results suggested that the degree of laryngeal jitter increased as a function of the phonated frequency. Jitter factors of 0.5-1.0 were considered average limits for sustained phonation by normal males. A study by Beckett (1969), also based on sustained phonations by normal male subjects, investigated the relationship of pitch perturbation to three levels of vocal constriction. Results indicated that the measure of pitch perturbation was a function of subjective vocal constriction. Using synthetic vowels, Rozsypal and Miller (1979) applied a multidimensional scaling technique to the analysis of jitter and shimmer. Their experiment indicated that (1) some jitter is necessary for sustained vowels to be perceived as natural, (2) the vowel sound determines the optimal amount of jitter, and (3) the shimmer effect is equal for all vowels and less pronounced than that of jitter.

At present, an area receiving much attention in the voice analysis of stress is the muscle microtremor phenomenon. The microtremor, also referred to as involuntary voice tremor, involuntary frequency modulation (FM), speech tremor, and pitch perturbation, is stirring professional interest in the fields of acoustic phonetics, aviation, law enforcement, and psychiatry.

It has been hypothesized that the voice microtremor is related to physiological tremor, which was discovered many years ago to be a normal accompaniment of voluntary muscle activity. Lippold (1971) noted that the normal contraction of a voluntary muscle is accompanied by tremor of the muscle, which takes the form of minute oscillations that diminish with excitation of the muscle source. The frequency characteristics of these oscillations, which occur between 8 and 12 cycles per second, were isolated by Halliday and Redfearn through the use of Fourier analysis (Edson, 1976). These research findings were not applied to voice stress analysis until the development of the Psychological Stress Evaluator (PSE), a deception detection instrument, by Bell, Ford and McQuistin (1972).

The popularity of this voice-analyzing equipment (Dektor, 1971) with law enforcement agencies, psychiatric clinics, private investigators, and others is based upon the manufacturer's claim that the PSE discerns a physiological tremor of the voice mechanism which is present in a relaxed emotional state and disappears under psychological stress. According to the manufacturer, this involuntary vocal tract tremor, which is superimposed upon the F0, is under the control of the central nervous system until it is suppressed by the autonomic nervous system, which gains dominance in a stress situation.

Some reports question the validity of the PSE and indicate that it only works under acute or high-stress conditions (McGlone et al., 1974; Papcun, 1974; Lambert, 1974). However, studies by Smith (1977) and Eden and Inbar (1975, 1976, 1978) support the PSE as an instrument capable of measuring anxiety. Smith's study indicated that "stress blocking" of the voice patterns appeared where it was expected to appear; however, more accurate and objective scoring systems were needed. Inbar and Eden's three-part study (1975, 1976, 1978) confirms the claim of PSE proponents that the central nervous system is the source of the vocal tract tremor. In the first part of the study (1975), electromyogram (EMG) correlates of the PSE were sought using two methods: (1) transcutaneous stimulation of the vocal tract muscles by external surface electrodes, to verify that muscle tension changes can generate correlated voice tremor, and (2) a throat microphone, to detect tremor-type vibrations in the pitch waveform. Positive results were obtained with the first method. The second method revealed that tremor vibrations detected in the first formant of regular speech were also present in the pitch waveform. In the second part of the study (1976), the hypothesis that the frequency changes in speech are controlled by the central nervous system was investigated. In this experiment, surface EMG recordings were used to estimate changes in the tension of muscles in the vocal area. Results indicated that voice tremor can be produced in two ways: (1) by mechanical subresonances in the vocal cords or vocal tract, and (2) by signals generated by the central nervous system. Cross-correlation results indicated that the voice tremor is produced by the central nervous system. The evidence for this conclusion is that the EMG oscillations were random in nature and always preceded the voice tremor by approximately the same amount of time for a particular vowel. Because of this finding, the EMG tremor could not originate from muscle spindle afferent reflex signals activated by mechanical sources, but only from central nervous system activation. In the final part of Inbar and Eden's study (1978), two theoretical aspects were tested: (1) the influence of pitch period variations on frequency changes resulting from the resonant characteristics of the vocal tract; and (2) the physiological parameters of the vocal system that are potentially able to govern involuntary frequency changes. Results indicated that activity of both the vocal cords and the vocal tract can produce frequency variation in the human voice.

In summary, in preparation for this effort, an extensive literature review documented studies relating to the varying emotional states of the voice and the acoustical correlates of the prosodic features of the voice: intensity, fundamental frequency variation (microtremor), and spectral patterns of intonation. It was decided to concentrate on the involuntary vocal microtremor even though not all of the literature agreed on the existence of the phenomenon or on its correlation with stress. Using microprocessor technology, which offers the only hope of real-time analysis capability, a program was proposed to perform autocorrelations on taped voices from operational settings estimated to be stressful for the operator. The hypothesis was that, if measurable, the microtremor would vary in degree in some consistent relationship to the estimated level of stress on the operator.

The decision to go directly to real-life voiced output tapes for analysis was based on accepting the hypothesis that the vocal musculature microtremor, which is superimposed on F0 as FM, changes as stress increases in some discernible way that can be detected and analyzed with instrumentation. There are ethical problems with creating high levels of stress in the laboratory, and there are questions about the validity of simulated emotions, even as performed by skilled actors. Taped voice outputs from aircrews in life-threatening, high-workload conditions were thought to have potential as good source material for these studies.

To this end, the USAF provided tapes of aircrews engaged in SEA air combat. Other live voiced outputs of pilots having inflight difficulties were obtained from the San Antonio Air Traffic Control facility. The lack of controlled variables and the individuality of each case reduce the methodological strength of such studies; however, the undisputed realism of the operational setting probably justified the use of such voiced outputs.


METHODS 

The original plan to use government-supplied recorded voiced outputs from aircrews in combat was altered because the government-supplied recordings proved to be unsatisfactory for analysis. The tapes finally selected for testing were of general aviation pilots having weather problems and receiving guidance from the San Antonio Air Traffic Control Tower. The difficulties with the combat tapes were mainly excessive background noise and frequently unintelligible speech. Also, voiced outputs during combat engagements are usually very cryptic and rarely of sufficient continuous duration for analysis, necessitating considerable splicing to eliminate pauses and to connect speech segments. Five seconds of continuous speech was selected as an appropriate tradeoff between the size of the statistical sample and the time resolution in tracking stress changes: the FM analysis was performed every one-half second, which gave 9 or 10 analyses per 5-second sample on which to compute statistics (means and standard deviations of each FM component).

The computer-based FM extraction system implemented for this study relied on waveform digitization as the data reduction technique. Digitization facilitates tests of intra- and inter-individual reliability and comparison with other stress data. In the waveform digitization approach, the FFT spectral analysis algorithm underlying the computational process was applied to the digitized representation of the waveform. The procedure is based upon an autocorrelation technique.

The front-end portion of the system processed the speech by continually extracting the FM of the fundamental (F0). The statistics portion computed a mean, square root, and standard deviation of the FM over the duration of the speech sample, within the 5 to 15 Hz FM frequency band. The software also printed the results of these statistical computations.


The  Equipment 

The equipment used in the voice stress project consists of (Fig. 1):

(1) a DEC PDP-11/34 computer running RT-11 software (GFE);

(2) a Computer Design and Applications MSP-3X array processor installed within the PDP-11/34;

(3) a Krohn-Hite multimode filter, model 3750;

(4) a Sharp RD-667 cassette deck; and

(5) a Wollensak, model 2820 AV, heavy duty cassette tape recorder.

The Sharp was selected as being adequate to preserve the original quality of the flight recordings. The Wollensak was used to produce a third-generation tape of 42 speech samples.

The Krohn-Hite serves as an adjustable filter, used primarily to prevent aliasing, which occurs when the original signal contains frequencies above one-half the sample rate (see any text on applying the discrete Fourier transform). The filter allows selectable rolloff rates (6, 12, 18, and 24 dB per octave); for maximum anti-aliasing, the highest rolloff rate was selected. The normal setup is a 60 to 3000 Hz passband with maximum rolloff outside this range (24 dB per octave).
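To make the folding rule concrete, the following Python sketch computes the frequency at which a component appears after sampling, using the 5000 Hz rate adopted for this study, and the idealized attenuation of a 24 dB-per-octave rolloff above the 3000 Hz corner. The function names are illustrative, and the attenuation formula is the straight-line asymptote, not a measured response of the Krohn-Hite filter.

```python
import math


def aliased_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a real sinusoid at f_hz after sampling at fs_hz."""
    f = f_hz % fs_hz           # sampling cannot distinguish f from f + k*fs
    return min(f, fs_hz - f)   # fold into the band 0 .. fs/2


def asymptotic_attenuation_db(f_hz: float, corner_hz: float,
                              slope_db_per_octave: float) -> float:
    """Idealized straight-line stopband attenuation above a low-pass corner."""
    if f_hz <= corner_hz:
        return 0.0
    return slope_db_per_octave * math.log2(f_hz / corner_hz)


fs = 5000.0  # sampling rate used in this study
print(aliased_frequency(2000.0, fs))   # below fs/2: unchanged
print(aliased_frequency(2800.0, fs))   # above fs/2: folds down to 2200 Hz
print(asymptotic_attenuation_db(6000.0, 3000.0, 24.0))  # one octave above the corner
```

A 2800 Hz component lies inside the 60 to 3000 Hz passband yet folds to 2200 Hz at this sampling rate, which illustrates why the choice of 5000 Hz depends on male voices having little energy above 2500 Hz.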

The PDP-11/34 contains a 12-bit (4096 counts) analog-to-digital converter. An existing and well-tested software program developed previously by Technology Incorporated was adapted to operate the converter in this application. The sampling rate can be varied from 16 Hz to 26,000 Hz. A 5000 Hz sampling rate was chosen largely by trial and error. The compromise was between waveform (speech) fidelity and the disk space needed to hold the digitized speech. Another consideration was that, for male speakers, there should not be appreciable harmonics above 2500 Hz, which is one-half of the 5000 Hz sampling rate. A higher sampling rate reduces aliasing problems and improves waveform fidelity, but it also increases the disk space needed to save the digitized data prior to further processing.

FIGURE 1. Voice Stress Assessment System for Processing and Analyzing Voice Output Data

The MSP-3X is a low-cost array processor suited to performing Fast Fourier Transforms (FFTs). It was installed within the PDP-11/34 along with associated software provided by the manufacturer. This software allows the MSP-3X to be used from a FORTRAN program. The MSP-3X is capable of performing a 256-point autocorrelation in under 10 msec. This was felt to be fast enough to do voice stress analysis in real time, once the analysis algorithm was fully specified.

The autocorrelation is done via:

(1) a forward FFT of zero-padded raw sampled data (zero padding is the usual approach to computing an autocorrelation via the FFT; i.e., 128 samples of speech are padded with 128 zeros and a 256-point FFT is taken);

(2) taking the power (sum of the squares of the real and imaginary parts); and

(3) an inverse FFT of the power, yielding a sampled autocorrelation waveform.
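The three steps can be sketched in Python, with NumPy standing in for the MSP-3X array-processor library (whose actual calls are not given here). The 128-sample frame zero-padded to a 256-point FFT follows the text; the test frame is a hypothetical 200 Hz sine sampled at 5000 Hz.

```python
import numpy as np


def fft_autocorrelation(frame: np.ndarray) -> np.ndarray:
    """Sampled autocorrelation of a real frame via the FFT, for lags 0 .. len(frame) - 1."""
    n = len(frame)
    # Step 1: forward FFT of the frame zero-padded to 2n points (e.g., 128 -> 256).
    # The padding keeps the FFT's circular correlation from wrapping around.
    spectrum = np.fft.rfft(frame, n=2 * n)
    # Step 2: power = real^2 + imaginary^2, i.e., the spectrum times its conjugate.
    power = spectrum.real ** 2 + spectrum.imag ** 2
    # Step 3: inverse FFT of the power yields the autocorrelation waveform.
    acf = np.fft.irfft(power, n=2 * n)
    return acf[:n]  # keep the non-negative lags


fs = 5000.0
t = np.arange(128) / fs
frame = np.sin(2 * np.pi * 200.0 * t)   # hypothetical 200 Hz voiced frame
acf = fft_autocorrelation(frame)
# Searching lags from 20 upward (250 Hz and below), the largest value lies at
# one period of the fundamental: 5000 / 200 = 25 samples.
print(20 + int(np.argmax(acf[20:])))
```

Without the padding, the inverse FFT would return a circular autocorrelation in which long lags wrap around and mix with short ones.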


The  Software 

The software developed by Technology Incorporated consists of an interactive package with three parts (data collection, real-time analysis, offline analysis) and a program to generate test or synthetic frequency-modulated (FM) waveforms (Appendix I). Programming of the real-time analysis component was not accomplished. The scenario originally envisioned was to analyze digitized data offline until a technique was found that would measure stress. With a validated technique in hand, the array processor could then be programmed to analyze speech in real time (i.e., as the speech was being digitized). However, offline analysis has not yet yielded a valid technique to implement.
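Appendix I is not reproduced here, so the following is only a hypothetical sketch of such a test-signal generator: a sine carrier whose instantaneous frequency is itself modulated sinusoidally. The carrier frequency, FM rate, and deviation defaults are illustrative assumptions, not the values used in the study.

```python
import numpy as np


def synthetic_fm(duration_s: float, fs_hz: float = 5000.0, f0_hz: float = 150.0,
                 fm_rate_hz: float = 10.0, deviation_hz: float = 3.0) -> np.ndarray:
    """Sine wave whose frequency is f0 + deviation * sin(2*pi*fm_rate*t).

    All default parameter values are hypothetical.
    """
    t = np.arange(int(round(duration_s * fs_hz))) / fs_hz
    # Phase = 2*pi times the integral of the instantaneous frequency:
    # 2*pi*f0*t - (deviation / fm_rate) * cos(2*pi*fm_rate*t), plus a constant.
    phase = (2 * np.pi * f0_hz * t
             - (deviation_hz / fm_rate_hz) * np.cos(2 * np.pi * fm_rate_hz * t))
    return np.sin(phase)


signal = synthetic_fm(5.0)   # a 5-second test sample, matching the analysis duration
print(signal.shape)
```

Because the FM rate and deviation are set explicitly, a signal like this gives a known answer against which the FM analysis can be checked.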

The system, as currently designed and operated, has the capability to analyze data in real time, assuming a valid technique can be developed. From the standpoint of timing, an autocorrelation rate of 128 per second allows about 8 msec (1/128 s = 7.8 msec) per autocorrelation, which is adequate for real time given the following step times:


Step                                      Time
Load array processor with 128 samples     0.46 msec
Zero-padded FFT                           1.40 msec
Take power                                1.00 msec
Inverse FFT                               1.48 msec
Find peak of autocorrelation              0.20 msec




It is assumed that the above steps would represent the major computation to be accomplished for a validated technique.
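As an arithmetic check on the timing table, the sketch below totals the listed step times and compares them with the per-autocorrelation budget at 128 autocorrelations per second. The step times are copied from the table; nothing else is assumed.

```python
# Per-step times in msec, copied from the timing table above.
step_times_ms = {
    "load 128 samples": 0.46,
    "zero-padded FFT": 1.40,
    "take power": 1.00,
    "inverse FFT": 1.48,
    "find peak": 0.20,
}

rate_per_s = 128
budget_ms = 1000.0 / rate_per_s          # time available per autocorrelation
total_ms = sum(step_times_ms.values())   # time the listed steps consume

print(f"budget {budget_ms:.2f} msec, used {total_ms:.2f} msec, "
      f"spare {budget_ms - total_ms:.2f} msec")
```

The listed steps use a little over half of the 7.8 msec budget, leaving roughly 3.3 msec per autocorrelation for any further work.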

Further, there is a tradeoff between increasing the number of cycles in order to get a better autocorrelation and being able to track changes in the fundamental. Consider: 50 msec is 1/20th of a second. If the fundamental is 200 Hz, this gives 10 cycles of the fundamental for analysis, which is enough for a reasonably good autocorrelation. However, if the fundamental has 20 Hz FM, then 1/20th of a second of data will contain only one complete cycle of FM, which the autocorrelation will not be able to measure.
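The window-length tradeoff reduces to multiplying a frequency by the window duration. The sketch below reproduces the two cases in the text and also converts the usual autocorrelation size of 256 into a duration, assuming that size counts data samples at the 5000 Hz rate (giving 51.2 msec, which the text rounds to 50 msec). The function name is illustrative.

```python
def cycles_in_window(freq_hz: float, window_s: float) -> float:
    """Number of cycles of freq_hz contained in a window of window_s seconds."""
    return freq_hz * window_s


window_s = 256 / 5000.0          # assumed: 256 data samples at 5000 Hz
print(window_s * 1000.0)         # window length in msec, about 51.2

print(cycles_in_window(200.0, 0.05))   # 200 Hz fundamental: about 10 cycles
print(cycles_in_window(20.0, 0.05))    # 20 Hz FM: about 1 cycle
```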

The package consists of a driver, prompting sections, data collection, and data analysis sections, all written in FORTRAN. Subroutine libraries utilized were the FORTRAN library, the System library, a previously developed machine-language data collection routine, and the array processor library. To use the package, one starts the driver, which prompts for which of the three sections to use:

(1) The data collection section prompts for sampling rate, sampling duration in seconds, buffer size, channel number, and resulting disk file name.

(2) After validating the replies, it pauses to allow the operator to prepare the equipment and start the tape deck.

(3) When the operator gives the computer a start signal, the voice signal is digitized. Afterwards, control returns to the driver, allowing the user to select the next operation.

The analysis section prompts for about 20 parameters governing data analysis. Some of the major prompts are:

(1) Starting and stopping points (in seconds) within a data file.

(2) Autocorrelation rate (usually 100 or 128 per second of voice tape data).

(3) Autocorrelation size (this corresponds to the number of msec of data covered by the autocorrelation); "256," corresponding to 50 msec of data, is the usual reply.

(4) The range of frequencies within which to search for the autocorrelation peak (typically 30 to 250 Hz for male speech).

The remaining prompts govern the nature and format of the results:

(5) The rate at which the FM analysis is done (usually twice per second).

(6) The range of FM frequencies to be covered (usually 5-15 Hz).

(7) Whether the autocorrelation periods are to be printed or displayed.

(8) Whether the FM results are to be printed or displayed.

(9) Whether the first autocorrelation waveform is to be printed or displayed.
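The frequency search range in prompt (4) translates into a range of autocorrelation lags, since a lag of k samples corresponds to a fundamental of fs/k Hz. The sketch below performs that conversion for the typical 30 to 250 Hz male range at the 5000 Hz sampling rate; rounding outward to whole lags is an assumption about how the limits would be applied, and the function name is illustrative.

```python
import math


def lag_search_range(fs_hz: float, f_min_hz: float, f_max_hz: float) -> tuple[int, int]:
    """Inclusive range of autocorrelation lags (in samples) to search for the F0 peak."""
    shortest = fs_hz / f_max_hz   # highest F0 -> shortest period
    longest = fs_hz / f_min_hz    # lowest F0 -> longest period
    return math.floor(shortest), math.ceil(longest)


lo, hi = lag_search_range(5000.0, 30.0, 250.0)
print(lo, hi)   # 20 167
```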
After prompting is finished and the replies are validated, the computer outputs the complete set of parameters (Tables 1-2, Appendix II). See text in section 2.3 for further explanation.

The data analysis software was checked by performing the analysis upon data representing a sine wave with a controlled amount of FM. The analysis works as expected upon such data.


The  Analysis  Technique 

Autocorrelations are computed at equally spaced points over a span of collected data to yield a specified autocorrelation rate. For a given autocorrelation, the appropriate section of collected digitized voice data is loaded into the array processor. A zero-padded real forward FFT is performed. The result is multiplied by its own complex conjugate, yielding the "power," or sum of the squares of the real and imaginary parts. This occurs in the frequency domain. The result is transformed back into the time domain by an inverse FFT, yielding a sampled autocorrelation waveform. (Readers not familiar with frequency-domain analysis may consult Applications of Digital Signal Processing, Alan V. Oppenheim, Ed., Prentice-Hall, 1978.)

The autocorrelation waveform is searched for a peak over the lag range corresponding to the expected fundamental frequency range. The location of the peak, the value of the peak, and its neighbors are unloaded from the array processor so that an interpolated peak may be found. The location of the peak corresponds to the period of the fundamental of the voiced speech.

An interpolation is done to improve the accuracy of the period. A 3-point, or parabolic, interpolation is used. An interpolated peak occurring outside the search limits is bounded, the final interpolated period being that of the appropriate limit.
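A minimal sketch of the peak search, 3-point parabolic interpolation, and bounding described above, in Python with NumPy. The function name, the 30-250 Hz defaults (taken from the typical prompt reply), and the choice to skip the interpolation when the three points do not form a maximum are assumptions; the original FORTRAN routine is not reproduced in this report.

```python
import numpy as np


def interpolated_period(acf: np.ndarray, fs_hz: float,
                        f_min_hz: float = 30.0, f_max_hz: float = 250.0) -> float:
    """Period (seconds) of the autocorrelation peak, refined by 3-point parabolic interpolation."""
    lag_min = fs_hz / f_max_hz                       # search limits, in samples
    lag_max = fs_hz / f_min_hz
    lo = max(1, int(np.ceil(lag_min)))               # keep a left-hand neighbor
    hi = min(int(np.floor(lag_max)), len(acf) - 2)   # keep a right-hand neighbor
    k = lo + int(np.argmax(acf[lo:hi + 1]))          # integer lag of the largest value

    # Vertex of the parabola through lags k-1, k, k+1 (used only if it is a maximum).
    y0, y1, y2 = acf[k - 1], acf[k], acf[k + 1]
    curvature = y0 - 2.0 * y1 + y2
    offset = 0.5 * (y0 - y2) / curvature if curvature < 0 else 0.0

    # An interpolated peak outside the search limits is bounded to that limit.
    lag = min(max(k + offset, lag_min), lag_max)
    return lag / fs_hz


lags = np.arange(256, dtype=float)
acf = 1000.0 - (lags - 25.3) ** 2                   # synthetic peak between samples 25 and 26
print(interpolated_period(acf, 5000.0) * 5000.0)    # lag of about 25.3 samples
```

For this synthetic input the three samples lie exactly on a parabola, so the interpolation recovers the true lag; on real autocorrelations it is only an approximation.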

The autocorrelation peak was assumed to be a Gaussian, or bell-shaped, peak. From three samples near the peak, the exact location of the peak can be determined. Since the top of the peak approximates a parabola, the peak location can be calculated by assuming a parabola, provided the three points encompass the peak (one point to one side and two points to the other side of the exact peak) and the peak is quite broad. The original intent was to implement parabolic interpolation first and then implement Gaussian interpolation if the peaks were found to be quite narrow. The peaks were found to be quite broad, so parabolic interpolation was retained. The only difference between fitting a parabola and fitting a Gaussian is taking logarithms of the three sampled amplitudes. It was originally thought necessary to make the pitch determination as accurate as possible, and interpolation was the way to do this.
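The remark that a Gaussian fit differs from a parabolic fit only by taking logarithms can be illustrated as follows. The sketch applies the same 3-point vertex formula to raw and to log amplitudes sampled from a hypothetical bell-shaped peak centered 0.2 sample to the right of the middle sample. It assumes all three amplitudes are positive, which an autocorrelation peak need not guarantee.

```python
import math


def parabolic_offset(y0: float, y1: float, y2: float) -> float:
    """Vertex of the parabola through (-1, y0), (0, y1), (1, y2), relative to the middle sample."""
    return 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)


def gaussian_offset(y0: float, y1: float, y2: float) -> float:
    """Same 3-point fit applied to log amplitudes; all three values must be positive."""
    return parabolic_offset(math.log(y0), math.log(y1), math.log(y2))


def gaussian(x: float, center: float = 0.2, width: float = 1.0) -> float:
    """Hypothetical bell-shaped peak centered 0.2 sample right of the middle sample."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


narrow = [gaussian(x, width=1.0) for x in (-1.0, 0.0, 1.0)]
broad = [gaussian(x, width=5.0) for x in (-1.0, 0.0, 1.0)]

print(gaussian_offset(*narrow))    # log fit recovers the true center
print(parabolic_offset(*narrow))   # narrow peak: parabola is biased (about 0.16)
print(parabolic_offset(*broad))    # broad peak: parabola is nearly exact (about 0.198)
```

The comparison is consistent with the decision in the text: for broad peaks, the simpler parabolic fit is already close to the Gaussian result.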

The interpolated periods are the input for the FM analysis; i.e., if there is FM in the original voice signal, it will result in an up-and-down motion of the periods. The frequency of these ups and downs will be the frequency of the FM. Thus, a Fourier analysis of the periods will extract the FM components. The periods are loaded into the array processor a section at a time, corresponding to the FM analysis rate and size. An ordinary forward real FFT is done and the power computed over the FM frequency range of interest. These powers are unloaded from the array processor to the PDP-11/34. They are normalized by their sum, so that the value printed for a given frequency is its fraction of the total power over the frequency band of interest. A moment is computed to indicate where within the band most of the power lies. The mean and standard deviation of each frequency column are also computed for a given analysis run.
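A sketch of the FM stage in Python with NumPy, under stated assumptions: each section holds 128 interpolated periods (one second at the usual 128 autocorrelations per second, giving 1 Hz frequency bins), the band of interest is 5-15 Hz, the "moment" is taken to be the power-weighted mean frequency, and the standard deviation is the population form. None of these details is specified exactly in the text, and the function names are hypothetical.

```python
import numpy as np


def fm_power_fractions(periods_s, acf_rate_hz=128.0, band_hz=(5.0, 15.0)):
    """Fraction of FM power per frequency bin within the band, plus the band's first moment."""
    x = np.asarray(periods_s, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2                    # ordinary forward real FFT -> power
    freqs = np.fft.rfftfreq(len(x), d=1.0 / acf_rate_hz)  # bin spacing = rate / section length
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    fractions = power[in_band] / power[in_band].sum()      # normalize by the band's total power
    moment = float(np.sum(freqs[in_band] * fractions))     # power-weighted mean FM frequency
    return freqs[in_band], fractions, moment


def summarize_run(sections, acf_rate_hz=128.0):
    """Mean and standard deviation of each frequency column over a run, and the peak-mean bin."""
    rows = []
    for section in sections:
        freqs, fractions, _ = fm_power_fractions(section, acf_rate_hz)
        rows.append(fractions)
    table = np.vstack(rows)          # one row per FM analysis, one column per bin
    means = table.mean(axis=0)
    stds = table.std(axis=0)         # population standard deviation (an assumption)
    peak_bin_hz = float(freqs[np.argmax(means)])
    return freqs, means, stds, peak_bin_hz


# Hypothetical input: 128 periods oscillating at 7 Hz around 1/150 s.
n = np.arange(128)
periods = 1.0 / 150.0 + 1e-4 * np.sin(2 * np.pi * 7.0 * n / 128.0)
freqs, fractions, moment = fm_power_fractions(periods)
print(freqs[np.argmax(fractions)], round(moment, 3))
print(summarize_run([periods, periods])[3])
```

The bin with the highest column mean corresponds to the "frequency of interest" selected in the Results section.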

The analysis runs at about one-quarter of real-time speed, with no effort made to optimize the performance of the analysis software. Data collection runs in real time. There is no means to play back collected data (i.e., to resynthesize the voice sound track from the disk files). No special treatment of noise, blank tape, or unvoiced speech was done; the averaging properties of the autocorrelation technique and the FM analysis are relied upon to reduce the effect of these factors. However, long blank sections of the tape were eliminated by dubbing them out.


Procedures 

Twenty taped voice outputs of aircraft operators under various levels of inherent stress were obtained from civilian and military sources. The tapes were indexed and assessed as to aircraft operator stress level (determined through subjective analysis) and audio quality.

Recordings obtained from the Air Traffic Control facility at the San Antonio International Airport were selected for analysis based upon their audio clarity, the amount of displayed stress, and the number of continuous speech samples. Three recordings were analyzed: two separate instances of aircraft operators lost in weather, and one Air Traffic Controller assisting one of the pilots. In addition, two male voices in a non-stress environment were recorded and analyzed.

A Wollensak, model 2820 AV, heavy duty cassette tape recorder and a Sharp Educator, model RD-665 AV, cassette recorder were used to produce a third-generation tape of 42 speech samples. The speech segments were ordered sequentially, from the operator's request for assistance to the tape's conclusion, when the pilot's safety was affirmed. Tapes were dubbed so that only one voice was recorded per tape. Silent spaces, pauses, etc., were dubbed out so that no more than one second of silence separated speech segments.


RESULTS 

The data revealed the presence of FM and its shift over the fundamental frequency (F0), but microtremor was not confirmed. The failure to confirm microtremor was due to the presence of unavoidable noise on the tapes.

The nature of digital signal processing is that, for noisy signals, the results show a broad peak where, in the ideal case, there would be a single value. In the case of the microtremor, the processing occurs in several steps.


The first step is the sampling and conversion of the speech signal. The Nyquist criterion requires that the sampling rate be at least twice the highest frequency of interest (in this case, the highest-order harmonic of the speech pitch).

After the signal has been sampled and digitized at a rate sufficient to encompass the desired harmonics, the pitch is determined. The effect of noise in the original signal is to introduce greater variability in the computed pitch (i.e., a less accurate pitch). The software actually computes the period, which is the reciprocal of pitch. Parabolic interpolation was an effort to improve the speech period determination. In the presence of noise, the parabola is flatter and interpolation is less accurate. At a sampling rate of 5 kHz, uninterpolated period accuracy would be one sample, or 200 µsec. For a 200 Hz pitch, or equivalently a 5000 µsec period, this would, without noise, yield a 1:25 accuracy. The signal-to-noise ratios in the voice tapes are such that even this accuracy is unobtainable. For those parts of the voice tape that gave pitch results, a 1:10 accuracy appears more reasonable.
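The accuracy figures follow from simple arithmetic: without interpolation, a period can be resolved only to one sampling interval. The sketch below reproduces the 200 µsec resolution and the 1:25 ratio for a 200 Hz pitch; the 1:10 figure for the noisy tapes is the author's estimate and is not computed here.

```python
fs_hz = 5000.0
pitch_hz = 200.0

resolution_us = 1e6 / fs_hz          # one sampling interval, in microseconds
period_us = 1e6 / pitch_hz           # pitch period, in microseconds
ratio = period_us / resolution_us    # period resolved to 1 part in `ratio`

print(f"{resolution_us:.0f} usec resolution, {period_us:.0f} usec period, 1:{ratio:.0f}")
```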

The last stage of the analysis is the FM determination from the speech periods. Assuming a 150 Hz fundamental with 10% accuracy, we get a 15 Hz FM band due to noise. This noise in the FM band of interest, which is added to the FM due to microtremor, tends to disguise the microtremor under these conditions. The hypothesis that the microtremor present in the frequency range of 5-12 Hz will vary consistently with changing levels of stress could not be tested, since the microtremor could not be confirmed using this technique. The results are confined to reporting the peak pitch periods that were brought out in the autocorrelation.

Tables 1 and 2 show the format of the data analysis display. The analysis program delivered two printouts for each analysis segment. The first listing is that of all the autocorrelations for each 30-sec segment of analyzed vocalizations. The second printout, the FM analysis, is a listing of the peak period means for each frequency bin between 5 and 11 Hz. The bin containing the most FM (highest mean) is considered the frequency of interest for that particular speech segment.

The pilots' and controller's vocalizations are tabulated in order of analysis so that speech segments and times of the peak means can be seen together. Each table is explained in the text that follows.

For the purpose of making comparisons between vocalizations obtained under real-life, stressed conditions and those obtained in unstressed, familiar surroundings, vocalizations were obtained from two adult male subjects during a relaxed recording session. Subject No. 1 was an audiovisual professional with speech training, while Subject No. 2 had no formal speech training. A summary of the analysis of two 30-sec speech segments for each subject is presented in Table 3. As can be seen in this table, both speakers have their FM peak means in the 5 Hz frequency bin, which differs considerably from the airborne and controller responses.

The peak means for the distributed FM for the pilot of Cessna 30-Golf
are shown in Table 4. The highest peak means occur during initial contact
with ground control (Thrush Control), a period when anxiety would be
expected to be high.

The analysis places the initial peak mean in the 7 Hz frequency bin
for the first 30 sec of vocalization and in the 6 Hz frequency bin for
the second 30-sec period. For the next six analysis segments, the peak
means all fall in the 5 Hz frequency bin. The peak means rise again to
7 Hz in analysis segments nine and ten (240-299.06 sec), where the pilot
is speaking with considerable emphasis; however, the effect of a
background voice on these analysis segments is unknown.

The peak means from the distributed FM for the pilot of Cessna 48-Golf
are shown in Table 5. Unlike the 30-Golf tape, the peak FM means remain
at or above the 7 Hz level for almost all of the taped vocalizations.
Only in the eighth analysis segment (210-239.06 sec) does the peak mean
fall to 6 Hz; however, in the final analysis segment it falls to the 5 Hz
frequency bin, which may or may not be a reliable value because of
interference. Very low values, such as those in the 430-439.38 sec
segment, can be discounted; they are probably due to vocalizations too
short in duration, or to no vocalization at all.

When comparing 30-Golf and 48-Golf peak means, 48-Golf has peak means
that are consistently in higher frequency bins than 30-Golf; however, in
the absence of baseline recordings for the two pilots, nothing definitive
can be said about these differences. On subjective impression, the
30-Golf vocalizations sounded more stressed than the numerical values
would seem to indicate. If a higher peak mean indicates a higher
emotional level, and if peak means of around 5 Hz represent relatively
unstressed vocalizations, then 30-Golf should have registered peak means
in higher frequency bins than the analysis showed.

The peak means for the distributed FM for the Air Traffic Controller
working 30-Golf show more variability than either of the two pilot tapes
(Table 6). For the first 3-1/2 minutes of tape analysis, the mean peak
FM is located in the 5 Hz frequency bin, but then it rises to the 6 Hz
bin for one analysis segment before falling back to the 5 Hz bin. This
fluctuating pattern continues until the 18th analysis segment (930-959.06
sec), where the peak FM rises to the 7 Hz bin, falls back to 6 Hz for one
segment, then spikes to the 9 Hz level before finally settling back to
the 5 Hz frequency bin for the rest of the recording. This range of
variability cannot be accounted for, except through a subjective
evaluation of the controller's emotional state based on his vocal output.

The fluctuation of FM in the 5-9 Hz range certainly indicates that
a measurable change is occurring that might relate to some stress effects.
Again, in the absence of baseline data on the individuals, any such
effect would be very difficult to quantify. Future studies could examine
voices from operational settings, but it would be desirable to have
previous laboratory data on the same individuals.

The same mean peak FM data were graphed as FM in Hz over time and as
FM normalized (percentage of one) over time. The data for the pilot of
30-Golf (Fig. 2) clearly show the same relationship as was depicted in
Table 4 for the FM charted over time. The normalized data for 30-Golf
produce a far more graphic picture of the variability of the FM over
time (Fig. 3).
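
The report does not spell out how the FM data were normalized. One plausible reading of "percentage of one" is that each segment's peak FM is divided by the largest value in the recording, so that the series runs up to 1.0; the sketch below assumes that reading, and the function name is hypothetical. The input is the sequence of peak-mean bins for the first ten 30-Golf segments as described in the text for Table 4 (7 Hz, 6 Hz, six segments at 5 Hz, then 7 Hz twice).

```python
from typing import List, Sequence


def normalize_to_peak(values: Sequence[float]) -> List[float]:
    """Scale a series so that its largest value becomes 1.0.

    This is an assumed interpretation of the report's "percentage of one".
    """
    if not values:
        raise ValueError("empty series")
    peak = max(values)
    if peak <= 0:
        raise ValueError("series must contain a positive value")
    return [v / peak for v in values]


if __name__ == "__main__":
    # Peak-mean bins (Hz) for the first ten 30-Golf segments, as described for Table 4.
    peak_bins_hz = [7, 6, 5, 5, 5, 5, 5, 5, 7, 7]
    print([round(x, 2) for x in normalize_to_peak(peak_bins_hz)])
    # [1.0, 0.86, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 1.0, 1.0]
```

On this reading, a drop from the 7 Hz bin to the 5 Hz bin appears as a fall of almost 30%, which may explain why the normalized plot looks more dramatic than the plot in Hz.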

For 48-Golf, the same relationship holds in the mean peak FM over
time (Fig. 4) and in the normalized data (Fig. 5). The variability, both
as a percentage and as peak FM frequency change, is even more evident than
for the other pilot. Unfortunately, no baseline data exist for either
pilot, so the change from normal, unstressed vocalizations cannot be
determined.

The Air Traffic Controller vocalization data give a very clear
picture of the variability of the peak means of the FM (Figs. 6 and 7).
Despite the obvious and dramatic variability in the Controller's FM
distribution, its correlation with a level of stress or emotional status
is purely subjective, as it was for the pilot data.


DISCUSSION 

The vocalizations analyzed in this effort were collected from real-world
stressor situations. Although the data are compared between pilots,
controllers, and unstressed male speakers, we feel that the more useful
comparison would have been between each speaker and his own baseline.

The  failure  to  find  microtremor  was  due  to  noise  levels  in  the  tapes 
that  could  not  be  economically  filtered  or  averaged  out  using  the  capability 
and  techniques  reported  here.  This  made  testing  the  hypothesis  impossible; 
however,  it  was  possible  to  continue  to  extract  pitch  information.  This 
provided  some  information  about  differences  between  speakers  and  further 
confirmed  the  hardware  and  software  capability. 

The data collected here do tend to support findings that as emotional
stress mounts, the speech signal tends toward distortion and displacement
into the higher frequencies. Popov (1971) reports this finding in his
Russian work. There is, unquestionably, a difference between speakers.
The two unstressed male speakers showed no tendency toward the higher
peak mean FM distributions seen in the pilot and controller data.
Between-pilot differences are apparent, as are differences between either
pilot and the controller, at least in the degree of variability in FM
peak mean distributions.

The  significance  of  the  variability  and  its  degree  is  less  clear. 

Given the extreme range of individual responses to stress, it does not
seem possible to make judgments of human response potential based on
one-time analyses of vocalizations without a reliable baseline, whether
that baseline is for the subject under analysis or is a human performance
database. Certainly, for the combat pilot, establishing a new baseline
would seem to be a daily need. The thesis that any stress likely affects
some response, whether or not that change is currently measurable, is
probably a valid observation. The thesis that stress affects performance
in a detrimental way is not always true; to a degree, stress can actually
improve task performance in some cases. An organism's response to stress
varies considerably from time to time and over a broad range of effect.
This variance can be considerable, yet remain below a level of measurable
performance interference. In fact, in some instances, performance may, to
all appearances, remain normal until the moment of catastrophic collapse.
On an individual basis, an organism has stress tolerance limits that vary
according to the daily state of the organism.

FIGURE 4. Peak FM distributed by frequency

A pilot who performs superbly on a given mission may make frequent
mistakes on essentially the same mission when suffering the effects of
a sleepless night due to gastric distress, or a reprimand from the boss.
Emotional and physical pressures during combat are often severe and
protracted. The emotional effects of one particularly hazardous mission
may linger for days, or the cumulative effects of 7-day-a-week operations
may finally culminate in an accident of a totally unexpected variety,
e.g., a gear-up landing. To be of any real value in a combat environment,
a stress measure would need to show at any given time where the pilot's
stress state is in relation to his own baseline, assuming also that the
baseline tells something about the individual's stress tolerance. In
particular, it would be most valuable to know his state prior to mission
launch, because once the aircraft is airborne and inside enemy territory,
recalling an individual pilot is unusual, if not impossible. Severe stress
in aerial combat usually lasts only a few minutes, sometimes seconds, and
the opportunity to intervene on behalf of an overstressed individual is
rare.

The data collected during this effort did not meet expectations for
several reasons, chiefly insufficient instrumentation and an inability
to control some variables.

The procedure used in measuring the speech segments needs to be
improved. Additional programming is suggested to mark and store the start
and stop parameters for each speech sample. The variation in the analysis
and the inconsistent repeatability of the data could be due to the lack
of instrumentation to ensure that the analysis of the voice signal begins
with the onset of vocalization and ends precisely at the end of the
speech segment. It is recommended that the system that digitizes the
voice output to disk also allow the computer operator to hear the speech
segments as they are processed by the computer. To augment this
procedure, an oscilloscope could provide additional feedback in securing
accurate start and stop times for the vocalizations.
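
The report recommends marking start and stop times by ear and with an oscilloscope. As a complementary software aid, a short-term energy threshold is a common, simple way to propose onset and offset times for an operator to confirm. The sketch below shows that generic technique, not the report's software. The 10 ms frame and the -30 dB threshold relative to the loudest frame are assumed values, and a noisy tape would need a threshold set above its own noise floor.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_bounds(
    samples: Sequence[float],
    rate_hz: float,
    frame_ms: float = 10.0,       # assumed analysis frame length
    threshold_db: float = -30.0,  # assumed threshold relative to the loudest frame
) -> Optional[Tuple[float, float]]:
    """Return (start_sec, stop_sec) spanning the first to last frame above threshold."""
    frame_len = max(1, int(rate_hz * frame_ms / 1000.0))
    energies = frame_energies(samples, frame_len)
    if not energies or max(energies) == 0.0:
        return None  # nothing but silence
    floor = max(energies) * 10.0 ** (threshold_db / 10.0)
    active = [i for i, e in enumerate(energies) if e >= floor]
    if not active:
        return None
    return active[0] * frame_len / rate_hz, (active[-1] + 1) * frame_len / rate_hz


if __name__ == "__main__":
    rate = 8000  # samples per second (assumed)
    silence = [0.0] * (rate // 2)
    tone = [0.5 * math.sin(2 * math.pi * 150 * n / rate) for n in range(rate)]
    print(speech_bounds(silence + tone + silence, rate))  # (0.5, 1.5)
```

The boundaries are only as fine as one frame, so an operator listening to the segment, as the report suggests, would still be needed to trim them precisely.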

There should be careful preparation of the voice output tapes used
for analysis. Extensive conditioning (clean-up) was not done on the
tapes analyzed in this study. A conditioning procedure should be used on
all tapes to be analyzed, even if the tape sounds clear to the ear. A
60 Hz hum, frequently found as background noise embedded in the speech
sample, can obscure the fundamental (F0). Although caution must be taken
to avoid erasing certain overtone series of the speech frequencies,
certain hums and ambient noise can be removed from a tape without
endangering the speech spectrum. It is probable that more filtering can
be done with the software program.
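
As an illustration of the kind of software filtering suggested above, the sketch below removes a 60 Hz hum with a second-order IIR notch filter, using the biquad notch formulas from Robert Bristow-Johnson's widely used "Audio EQ Cookbook". This is a swapped-in standard technique, not the report's software. The 8 kHz sampling rate and the quality factor Q = 30 (a notch roughly 2 Hz wide) are assumptions; a narrow notch is chosen in keeping with the caution against erasing nearby speech components. Hum harmonics at 120 Hz, 180 Hz, and so on would each need their own notch.

```python
import math
from typing import List, Sequence, Tuple


def notch_coefficients(
    f0_hz: float, fs_hz: float, q: float
) -> Tuple[Tuple[float, float, float], Tuple[float, float]]:
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized by a0."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a


def notch_filter(
    samples: Sequence[float], fs_hz: float, f0_hz: float = 60.0, q: float = 30.0
) -> List[float]:
    """Apply the notch sample by sample (direct form I)."""
    (b0, b1, b2), (a1, a2) = notch_coefficients(f0_hz, fs_hz, q)
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def peak_after(signal: Sequence[float], start: int) -> float:
    """Largest absolute value from index `start` on (skips the start-up transient)."""
    return max(abs(v) for v in signal[start:])


if __name__ == "__main__":
    fs = 8000.0  # assumed sampling rate
    n = int(2 * fs)
    hum = [math.sin(2 * math.pi * 60 * k / fs) for k in range(n)]
    voice_like = [math.sin(2 * math.pi * 150 * k / fs) for k in range(n)]
    settle = int(1.5 * fs)
    print("60 Hz residual:", peak_after(notch_filter(hum, fs), settle))
    print("150 Hz passed:", peak_after(notch_filter(voice_like, fs), settle))
```

The narrower the notch (larger Q), the less speech energy it removes, but the longer it takes to settle; that trade-off is one reason the filter settings would have to be chosen per tape.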

The equipment used in recording the tapes for analysis was not of the
same type or quality for all tapes. The same equipment should be used in
duplicating all speech segments. Tape recorders should be of industrial
quality. They should be tested for wow and flutter, since flutter can be
greater than the FM being measured, thereby masking it. These were
problems of limited funding.

The PDP 11/34 should be equipped with audio playback and should also be
capable of graphic display of data. The 12-bit A-to-D converter should
be replaced with a 16-bit A-to-D converter; the 12-bit converter
apparently does not offer quite enough resolution for the job.
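
The resolution argument can be made concrete with the textbook figure for an ideal N-bit converter digitizing a full-scale sine wave, SQNR = 6.02N + 1.76 dB. The sketch below evaluates it for 12 and 16 bits. Real converters, and a tape source with its own noise floor, fall short of these ideal numbers, so the result is an upper bound, not a measurement of the equipment used in this study.

```python
import math


def ideal_sqnr_db(bits: int) -> float:
    """Quantization signal-to-noise ratio of an ideal converter, full-scale sine input.

    Equals 10*log10(1.5 * 4**bits), i.e. about 6.02*bits + 1.76 dB.
    """
    return 10.0 * math.log10(1.5 * 4.0 ** bits)


if __name__ == "__main__":
    for n in (12, 16):
        print(f"{n}-bit: {ideal_sqnr_db(n):.1f} dB")
    # prints "12-bit: 74.0 dB" then "16-bit: 98.1 dB"
```

The four extra bits buy about 24 dB of headroom, which matters mainly when the speech is recorded well below full scale.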

The results showed that the program and instrumentation could identify
the divergence of the FM from F0 and specify its frequency component
between 5 and 11 Hz; however, there is insufficient information to firmly
relate the FM activity to stress. For the present, at least, it does not
appear that we have the technology or insight to do meaningful real-time
stress analysis of nonlaboratory voiced outputs.

In summary, the hypothesis that the voice contains a microtremor that
varies in relation to stress could not be tested, because the microtremor
could not be found using the autocorrelation technique. The failure is
thought to be due to noise in the 5-11 Hz FM frequency range of interest.
Pitch was extracted and peak period means recorded; these tended to shift
primarily between 5 and 9 Hz. The shifts could not be related to any
subjectively determined level of stress in the speaker's voice. In our
opinion, subjective determination of stress will be nearly impossible
without the evaluator being familiar with the speaker. Judging voiced
outputs recorded in an operational setting will always be particularly
difficult because of the context of aircraft operations: is the operator
yelling to overcome interference or to gain attention, or because he is
excited or scared? Given the difficulty of identifying vocal correlates
of emotional states in operational environments, it is recommended that
future research concentrate on developing laboratory baselines and then
collecting operational data, with the operator reporting his stress
level. Autocorrelation may not be the right vehicle for analysis.
[Conversations with an industry leader in voice analysis (Signal
Technology, Inc., 1982) indicate that statistical summaries of period and
amplitude are the best they can do on continuous vowels recorded under
laboratory conditions. Their autocorrelation technique for extracting
pitch is the same as that used in this effort.]
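
For readers unfamiliar with the technique, the sketch below shows a generic time-domain autocorrelation pitch estimator. The lag at which a frame best matches a shifted copy of itself is taken as the pitch period. It is a textbook version, not a reconstruction of the report's VSAADD.FOR analysis, and the 60-400 Hz search range and the synthetic test frame are assumptions. Because the estimated period is a whole number of samples, its resolution is one sample interval; at 8 kHz and 150 Hz that is a step of roughly 2%, which shows why fine period measurement matters when looking for small frequency modulations.

```python
import math
from typing import Optional, Sequence


def autocorr_pitch_hz(
    frame: Sequence[float],
    fs_hz: float,
    fmin_hz: float = 60.0,   # assumed lowest pitch searched
    fmax_hz: float = 400.0,  # assumed highest pitch searched
) -> Optional[float]:
    """Estimate F0 as fs / (lag of the largest autocorrelation value in range)."""
    if not frame:
        return None
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC so it does not dominate
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return None  # silent frame: no pitch
    lag_min = max(1, int(fs_hz / fmax_hz))
    lag_max = min(int(fs_hz / fmin_hz), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased estimate: fewer terms at long lags, which mildly favors the
        # shortest period and helps avoid picking a multiple of it.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return fs_hz / best_lag if best_lag else None


if __name__ == "__main__":
    fs = 8000.0
    frame = [math.sin(2 * math.pi * 150.0 * k / fs) for k in range(400)]  # 50 ms
    print(autocorr_pitch_hz(frame, fs))  # roughly 151 Hz (8000 / 53)
```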


RECOMMENDATIONS 

A possible direction for further research might be to work toward
the development of a comprehensive FM stress scale. Research areas to
be included might be the study of (1) FM characteristics of amplitude and
periodicity, and (2) the FM relationship to the following vocal parameters
of the speech segment: (a) fundamental frequency (F0), its median and
range and, most importantly, its contour vs. time; and (b) the energy
distribution in the spectrum, particularly between 500 and 1000 Hz. At
this time, research indicates that these acoustic cues are of primary
importance in communicating essential information about emotional
expression.

APPENDIX I


SOFTWARE INDEX


A) SINGEN.FOR: Program to generate a disk file having the same format as
   voice data, with operator-specified FM frequency and depth of modulation.

B) VSA.FOR: Driver for voice stress software.

C) VSACOM.FOR: Chain or common area for voice stress software. Holds data
   collection and analysis parameters.

D) VSACDD.FOR: Driver and prompting routine for data collection.
   a) .MAIN.: Driver.
   b) MAIN11: Prompting routine.

E) VSAATD.FOR: Driver and supervisor routine for data collection.
   a) .MAIN.: Driver and storage allocator.
   b) IATD: Buffer manager.
   c) RACMPT: Buffer overrun (completion routine).

F) IATD.MAC: Assembly language A-to-D software.
   a) IAATD: Setup routine (FORTRAN callable).
   b) MCATD: Multichannel interrupt routine (not used).
   c) ATD: Single channel interrupt routine.
   d) PARMS: Parameter validation of FORTRAN call.

G) VSARTA.FOR: Real-time voice stress analysis (stub).

H) VSAADD.FOR: Off-line voice stress analysis (data on disk).
   a) .MAIN.: Driver.
   b) ANAL: Data analysis using array processor.

I) MYLIB.FOR: Prompting for VSAADD.FOR.
   a) PROMPT: Voice data analysis prompting.
   b) ISPWR2: Check to see if a value is a power of 2.
   c) SCROLL: Turn scrolling on or off.
   d) CLR: Clear screen.
   e) CURSOR: Position the cursor.
   f) DELAY: Delay specified number of seconds.
   g) IIRF50: File name prompt in read mode using cursor positioning.
   h) IFPMT: Prompt for file name using cursor positioning.
   i) NYCHG: Allow selective re-execution of prompts with cursor positioning.
   j) 102: Prompt for integer using cursor positioning.
   k) IIQYN: Prompt for yes/no answer using cursor positioning.
   l) IIRQ: Prompt for real value using cursor positioning.

J) PAGES.FOR: Output formatting and statistics.
   a) PAGE1: Autocorrelation periods output.
   b) PAGE2: FM content output; includes moment calculations and statistics.
   c) PAGE3: Output means and variances.
   d) GRAPH: Rough graph output of FM content.

K) BORDER.FOR: Subroutines for CRT terminal fixed display.
   a) MARQUE: Put up fixed text.
   b) BORDER: Draw border around screen.
   c) LFT: Draw line from left to right.
   d) RT: Draw line from right to left.
   e) ASCEND: Draw line from bottom to top.
   f) DESCND: Draw line from top to bottom.

L) PTLIB.FOR*: Previously developed prompting routines (IQ, IRT, IQYN, etc.).

M) FORLIB.FOR*: FORTRAN library (SIN, COS, SQRT, etc.).

N) SYSLIB.FOR*: RT-11 library (PRINT, IREADW, etc.).

O) MSPLIB.FOR*: Array processor library (ZRFFT, RIFT, CCMXT, etc.).


* Source listings not available or not included.

TABLE 2. Sample of analysis printout for
peak FM by frequency bin. Peak
FM for this 30-sec analysis is
in the 5 Hz frequency bin.

AUTOCORRELATION PERIOD MEANS
FREQ  BINS  5-11  Hz 


mean  peak  FM  occurring  in 
the  5  Hz  frequency  bin. 

Thrush positive!"

340.00-369.06  0.098  0.085  0. x31  0.118  0.120  "Uh... they vary... about three quarters to a mile visibility... stable."


-569.06  0.092  0.105  0.104  0.130  0.121  "We're at two thousand. It's not very clear at all right here."

"That's affirmative."  CONTINUED...


THIS  PERIOD  IS  SHARED  WITH  1ST  VOCALIZATION  OF  GROUND  #1 




"Can  you  climb  and  maintain  VFR,  Cessna  Three-Zero 
Golf?" 

CONTINUED.



Uh... correction, make that five seconds.


"O.K. and uh... approximately how long did you fly east of
Luling before you turned and proceeded westbound?"

"Roger. Suggest you do not descend below three thousand."